Creating a Handwriting Recognition Corpus for Bushman Languages
Abstract
Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.
Publication date: 2011